Speaker Diarization in Meetings Domain (Nanyang Technological University)

Author

  • Nguyen Trung Hieu
Abstract

The purpose of this study is to develop robust techniques for speaker segmentation and clustering, with a focus on the meetings domain. However, the techniques examined can also be applied to other domains. Traditional speaker diarization techniques, developed for telephone conversations or broadcast news, are based on a single input channel. This differs notably from the meetings domain, where multiple channel inputs may be available. When adapted to meetings, these techniques perform worse than expected because they do not exploit direction-of-arrival information, which becomes available when multiple microphones are present. Moreover, many of these techniques come with tunable parameters that must be adjusted manually for each data set. In this thesis, our focus is on robust and accurate speaker diarization in meetings. Our aim is to improve segmentation and clustering performance under diverse conditions while keeping the number of manually tuned parameters to a minimum. A comparative study of various distance metrics is first carried out to find the most appropriate metric for speaker segmentation and clustering. Our proposed metric, which can be seen as an extension of the cross likelihood ratio (CLR) that exploits second-order statistics, is shown to outperform other popular metrics in a speaker verification task. A further advantage is that it is much faster to compute than the Bayesian Information Criterion (BIC). This work also proposes novel cluster validation methods to determine the optimal number of speakers. In our methods, the clustering quality is defined as the separation between the set of intra-cluster distances and the set of inter-cluster distances, and two metrics are proposed to measure this separation. For each hypothesized number of speakers, the clustering quality is computed, and the maximum value corresponds to the optimal...
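The cluster validation idea described above can be sketched in Python. The idea is to score each hypothesized speaker count by how well the intra-cluster and inter-cluster distance sets separate, then keep the count with the highest score. This is a minimal sketch under stated assumptions, not the thesis's method. The abstract does not specify its two separation metrics or its clustering algorithm, so the hypothetical helpers `average_linkage`, `separation`, and `estimate_num_speakers` substitute plain average-linkage agglomerative clustering and a Fisher-style ratio. The toy absolute-difference distances stand in for the proposed CLR-based metric.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

# Sketch only: the clustering and the separation score below are assumed
# stand-ins, not the two metrics proposed in the thesis.


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


def separation(dist, clusters):
    """Hypothetical clustering-quality score: how far the inter-cluster
    distance set sits above the intra-cluster distance set, in units of
    their pooled spread (a Fisher-style ratio)."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    intra, inter = [], []
    for i, j in combinations(range(len(dist)), 2):
        (intra if label[i] == label[j] else inter).append(dist[i][j])
    if not intra or not inter:
        return float("-inf")  # undefined if either distance set is empty
    spread = pvariance(intra) + pvariance(inter)
    return (mean(inter) - mean(intra)) / math.sqrt(spread + 1e-12)


def estimate_num_speakers(dist, max_k):
    """Score each hypothesized speaker count (2..max_k) and return the best."""
    scores = {k: separation(dist, average_linkage(dist, k))
              for k in range(2, max_k + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 1-D "segment features" from three speakers; absolute difference
    # stands in for a real segment-to-segment distance metric.
    feats = [0.0, 0.2, 0.4, 5.0, 5.1, 5.3, 10.0, 10.2, 9.9]
    dist = [[abs(x - y) for y in feats] for x in feats]
    best_k, scores = estimate_num_speakers(dist, max_k=5)
    print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

A single cluster leaves no inter-cluster distances, and all-singleton clusters leave no intra-cluster distances. For that reason, the sketch scores counts from 2 upward and treats an empty distance set as undefined. A real system would plug its own distance metric and separation measure into `separation`.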

Similar resources

Audio-Video Speaker Diarization for Unsupervised Speaker and Face Model Creation

Our goal is to create speaker models in audio domain and face models in video domain from a set of videos in an unsupervised manner. Such models can be used later for speaker identification in audio domain (answering the question "Who was speaking and when?") and/or for face recognition ("Who was seen and when?") for given videos that contain speaking persons. The proposed system is based on an a...

New bilingual speech databases for audio diarization

This paper describes the process of collecting and recording two new bilingual speech databases in Spanish and Basque. They are designed primarily for speaker diarization in two different application domains: broadcast news audio and recorded meetings. First, both databases have been manually segmented. Next, several diarization experiments have been carried out in order to evaluate them. Our b...

Audio Segmentation for Meetings Speech Processing

Audio Segmentation for Meetings Speech Processing, by Kofi Agyeman Boakye. Doctor of Philosophy in Engineering (Electrical Engineering and Computer Sciences), University of California, Berkeley; Professor Nelson Morgan, Chair. Perhaps more than any other domain, meetings represent a rich source of content for spoken language research and technology. Two common (and complementary) forms of meeting spee...

Speaker Diarization in Meetings Domain

The purpose of this study is to develop robust techniques for speaker segmentation and clustering, with a focus on the meetings domain. The techniques examined can, however, be applied to other domains such as telephone and broadcast news. Traditional techniques for speaker diarization developed for telephone conversations or broadcast news are based on a single channel, which is notably different f...

Université Paris XI, UFR Scientifique d'Orsay: thesis for the degree of Doctor of Science of the Université Paris XI Orsay. Subject: Acoustic-based Speaker Diarization

This thesis presents work on speaker diarization for different types of audio recordings, especially broadcast news (BN) and meetings. Speaker diarization is a relatively recent speech processing technique, but it has attracted strong research efforts due to its great benefit to other speech technologies, such as rich transcription, audio indexing and speak...


Publication date: 2009